Phrase Based Document Retrieving by Combining Suffix Tree index data structure and Boyer-Moore faster string searching algorithm
Abstract
Phrases have been considered more informative feature terms for improving the effectiveness of document retrieval. This paper proposes an algorithm, Phrase Based Document Retrieval, that retrieves similar documents by combining two existing algorithms. The first is the suffix tree, an index data structure. The second is the Boyer-Moore algorithm, a fast string-searching algorithm. The suffix tree is built with E. Ukkonen's algorithm from "On-line Construction of Suffix Trees", an efficient linear-time suffix-tree construction method. The Boyer-Moore algorithm is then applied to the constructed suffix tree to check whether the pattern (the input phrase) is present, both with its words in order and without regard to order, and so retrieve similar documents. Furthermore, by studying the properties of the suffix tree and the Boyer-Moore algorithm, we conclude that the suffix tree data structure can store very large documents and that the Boyer-Moore algorithm checks for the presence of a pattern quickly. This conclusion explains why Phrase Based Document Retrieval works much better than other document retrieval approaches.
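The abstract does not spell out how Boyer-Moore is combined with the suffix tree, so the following Python is only a sketch under stated assumptions. It runs a simplified Boyer-Moore search directly over each document's text instead of over a suffix tree. The search compares right to left and uses the bad-character rule only; the good-suffix rule is omitted. It reads "in order" as a whole-word match of the exact phrase, and "without order" as every word of the phrase occurring somewhere in the document. The names `bm_search` and `retrieve` and the sample documents are hypothetical.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Bad-character table: the last index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search(text: str, pattern: str) -> int:
    """Index of the first occurrence of pattern in text, or -1.

    Simplified Boyer-Moore: right-to-left comparison plus the bad-character
    rule only (no good-suffix rule).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    last = last_occurrence(pattern)
    s = 0  # current alignment of the pattern's first character in the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s
        # Align the mismatched text character with its last occurrence in the
        # pattern (or move past it if absent); always advance by at least 1.
        s += max(1, j - last.get(text[s + j], -1))
    return -1


def retrieve(docs: dict[str, str], phrase: str) -> tuple[list[str], list[str]]:
    """Return (ids with the phrase in order, ids with every word in any order).

    Assumption (hypothetical interpretation of "in order" / "without order"):
    in order = exact whole-word phrase; without order = all words present.
    """
    words = phrase.lower().split()
    if not words:
        return [], []
    in_order, any_order = [], []
    for doc_id, body in docs.items():
        # Normalise case and whitespace; pad with spaces so matches are whole words.
        text = " " + " ".join(body.lower().split()) + " "
        if bm_search(text, " " + " ".join(words) + " ") >= 0:
            in_order.append(doc_id)
        if all(bm_search(text, " " + w + " ") >= 0 for w in words):
            any_order.append(doc_id)
    return in_order, any_order


docs = {
    "d1": "Suffix trees index large text collections",
    "d2": "Text collections can be indexed with suffix trees",
    "d3": "Boyer-Moore is a fast string searching algorithm",
}
print(retrieve(docs, "suffix trees"))  # (['d1', 'd2'], ['d1', 'd2'])
print(retrieve(docs, "trees suffix"))  # ([], ['d1', 'd2'])
```

Padding the normalized text and each search pattern with spaces restricts matches to whole words, so "tree" does not match inside "trees".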
Similar resources
Skriptum VL Text Indexing
In this section we will introduce suffix trees, which, among many other things, can be used to solve the string matching task (find pattern P of length m in a text T of length n in O(n + m) time). We already know that other methods (Boyer-Moore, e.g.) solve this task in the same time. So why do we need suffix trees? The advantage of suffix trees over the other string-matching algorithms (Boyer-...
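The string-matching use of a suffix tree described above can be made concrete with a minimal Python sketch. It is a port of Ukkonen's online construction (the construction the abstract cites), followed by a lookup that walks from the root along the pattern. The lookup then collects every leaf below the match point. Each leaf is one suffix, so each leaf yields one starting position. This is an unoptimized illustration, not code from any of the cited works. The names `Node`, `SuffixTree` and `find_all` are hypothetical, and the terminal symbol `$` is assumed not to occur in the input text.

```python
class Node:
    def __init__(self, start: int, end: list[int]):
        self.start = start              # edge label is text[start:end[0]]
        self.end = end                  # all leaves share one list, so they grow together
        self.children: dict[str, "Node"] = {}
        self.link = None                # suffix link (None is treated as root)

    def edge_len(self) -> int:
        return self.end[0] - self.start


class SuffixTree:
    def __init__(self, text: str, terminal: str = "$"):
        self.text = text + terminal     # assumption: terminal does not occur in text
        self.root = Node(-1, [-1])
        self._build()

    def _build(self) -> None:
        text, root = self.text, self.root
        leaf_end = [0]
        node, edge, length, remaining = root, 0, 0, 0  # active point
        for i, ch in enumerate(text):
            leaf_end[0] = i + 1         # rule 1: every existing leaf grows by ch
            remaining += 1
            last_internal = None
            while remaining > 0:
                if length == 0:
                    edge = i
                nxt = node.children.get(text[edge])
                if nxt is None:
                    # rule 2: new leaf hanging directly off the active node
                    node.children[text[edge]] = Node(i, leaf_end)
                    if last_internal is not None:
                        last_internal.link = node
                        last_internal = None
                else:
                    if length >= nxt.edge_len():   # skip/count: walk down the edge
                        edge += nxt.edge_len()
                        length -= nxt.edge_len()
                        node = nxt
                        continue
                    if text[nxt.start + length] == ch:
                        # rule 3: suffix already present; end this phase
                        if last_internal is not None and node is not root:
                            last_internal.link = node
                            last_internal = None
                        length += 1
                        break
                    # rule 2 with an edge split
                    split = Node(nxt.start, [nxt.start + length])
                    node.children[text[edge]] = split
                    split.children[ch] = Node(i, leaf_end)
                    nxt.start += length
                    split.children[text[nxt.start]] = nxt
                    if last_internal is not None:
                        last_internal.link = split
                    last_internal = split
                remaining -= 1
                if node is root and length > 0:
                    length -= 1
                    edge = i - remaining + 1
                elif node is not root:
                    node = node.link or root

    def find_all(self, pattern: str) -> list[int]:
        """Sorted start positions of every occurrence of pattern in the text."""
        node, i, depth = self.root, 0, 0
        while i < len(pattern):
            child = node.children.get(pattern[i])
            if child is None:
                return []
            k = min(child.edge_len(), len(pattern) - i)
            if self.text[child.start:child.start + k] != pattern[i:i + k]:
                return []
            i += k
            depth += child.edge_len()   # string depth of child
            node = child
        # Every leaf below `node` is a suffix that begins with pattern.
        hits, stack = [], [(node, depth)]
        while stack:
            cur, d = stack.pop()
            if not cur.children:
                hits.append(len(self.text) - d)
            for c in cur.children.values():
                stack.append((c, d + c.edge_len()))
        return sorted(hits)


tree = SuffixTree("banana")
print(tree.find_all("ana"))  # [1, 3]
print(tree.find_all("nab"))  # []
```

Ukkonen's algorithm needs linear time for a constant-size alphabet. This port favors readability (dictionary children, string slices) over constant factors. Once the tree is built, each lookup costs O(m) character comparisons plus time proportional to the number of occurrences reported.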
Skriptum VL Text-Indexierung
In this section we will introduce suffix trees, which, among many other things, can be used to solve the string matching task (find pattern P of length m in a text T of length n in O(n+m) time). In the exercises, we have already seen that other methods (Boyer-Moore, e.g.) solve this task in the same time. So why do we need suffix trees? The advantage of suffix trees over the other string-matchi...
Space-efficient Data Structures for String Searching and Retrieval
Chapter 1: Introduction; 1.1 The Models of Computation; 1.2 Our Contributions; ...
A Space-Efficient Implementation of the Good-Suffix Heuristic
We present an efficient variation of the good-suffix heuristic, first introduced in the well-known Boyer-Moore algorithm for the exact string matching problem. Our proposed variant uses only constant space, retaining much the same time efficiency of the original rule, as shown by extensive experimentation.
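For reference, the classic good-suffix rule that this work starts from can be sketched in Python. This is the textbook O(m)-space preprocessing, a shift table plus a border table of length m + 1. It is not the constant-space variant the paper proposes. To keep the example short, the search loop applies only the good-suffix rule and omits the bad-character rule. The function names are hypothetical.

```python
def good_suffix_shifts(p: str) -> list[int]:
    """shift[j + 1] is how far to slide p after a mismatch at p[j];
    shift[0] is the slide after a full match."""
    m = len(p)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start index of the widest border of p[i:]
    # Case 1: the matched suffix occurs again earlier in p.
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and p[i - 1] != p[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only part of the matched suffix is also a prefix of p.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def good_suffix_search(text: str, p: str) -> list[int]:
    """All start positions of p in text, using only the good-suffix rule."""
    m, n = len(p), len(text)
    shift = good_suffix_shifts(p)
    hits, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and p[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return hits


print(good_suffix_shifts("abbabab"))              # [5, 5, 5, 5, 2, 5, 4, 1]
print(good_suffix_search("abracadabra", "abra"))  # [0, 7]
```

The two tables of length m + 1 are exactly the extra space that a constant-space variant avoids.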
Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts
In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the Boyer-Moore strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior...
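The candidate-then-verify idea summarized above can be illustrated in Python under heavy simplifications. The text and pattern are Huffman-encoded into strings of '0'/'1' characters. Python's built-in `str.find` stands in for the Boyer-Moore-style scan over the encoded bits. The codeword-alignment check uses a precomputed map of codeword start offsets instead of the skeleton tree the paper describes. The function names `huffman_codes` and `compressed_find` are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free Huffman code (char -> bit string) for text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate alphabet: give the lone symbol code "0"
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: partial code}).
    heap = [(w, k, {ch: ""}) for k, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def compressed_find(text: str, pattern: str) -> list[int]:
    """Character offsets of pattern in text, found by searching the encoded bits."""
    codes = huffman_codes(text)
    if not pattern or any(ch not in codes for ch in pattern):
        return []
    parts, starts, pos = [], {}, 0  # starts: bit offset -> character offset
    for idx, ch in enumerate(text):
        starts[pos] = idx
        parts.append(codes[ch])
        pos += len(codes[ch])
    encoded = "".join(parts)
    enc_pat = "".join(codes[ch] for ch in pattern)
    hits = []
    k = encoded.find(enc_pat)           # candidate shift (stand-in for a BM scan)
    while k != -1:
        # Verification: accept only candidates that start on a codeword boundary.
        # Because the code is prefix-free, an aligned bit match decodes to pattern.
        if k in starts:
            hits.append(starts[k])
        k = encoded.find(enc_pat, k + 1)
    return hits


print(compressed_find("abracadabra", "abra"))  # [0, 7]
```

Without the alignment check, a bit-level match could begin in the middle of a codeword and report an occurrence that does not exist in the original text.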
Publication date: 2014